PATENT

ATTORNEY DOCKET NO: 50093/016001 

NOVEL BACTERIAL RNase P PROTEINS AND THEIR USE IN
IDENTIFYING ANTIBACTERIAL COMPOUNDS

Background of the Invention 

This invention relates to novel bacterial ribonuclease P protein subunits and 
their use as targets in screening assays to identify compounds useful as
antibacterial agents.

Ribonuclease P (RNase P) is an endoribonuclease that cleaves the 5'-terminal
leader sequences of precursor tRNAs. RNase P has been characterized in a
representative number of species.

In bacteria, the RNase P holoenzyme is composed of a catalytic RNA subunit
(350-450 nucleotides; encoded by the rnpB gene) and a single protein subunit
(110-160 amino acids; encoded by the rnpA gene); both are essential for in vivo
activity. In Escherichia coli (E. coli) the RNA subunit is termed M1 and the
protein subunit is C5. The C5 protein engages in specific interactions with the
M1 RNA to stabilize certain M1 RNA conformations. Through these interactions
with M1, C5 plays a critical role in the recognition/binding of some substrates.

Comparison of RNase P protein subunits between bacterial species reveals 
that their primary structures have only a moderate degree of identity. For example, 
the protein subunits of Bacillus subtilis (B. subtilis) and E. coli are 30% identical.
The functional significance of some conserved amino acid residues has been
confirmed by mutagenesis studies, which have shown that these conserved amino
acids play a significant role in the catalytic function of the RNase P holoenzyme. 



The tertiary structure of the RNase P protein subunit expressed in B. subtilis 
has been determined by X-ray crystallography. The overall topology of α-helices
and β-sheets is α1 β1 β2 β3 α2 β4 α3, with an uncommon β3 α2 β4 cross-over
connection that may confer specific functional consequences. Another functional
aspect of the protein is the long loop connecting β2 to β3, termed the metal binding
loop, which binds Zn2+ ions and mediates interlattice contacts. In addition, the
crystal structure reveals an overall fold that is similar to the ribosomal protein S5, 
translational elongation factor EF-G (domain IV), and DNA gyrase. 

Many pathogens exist for which there are few effective treatments and the 
number of strains resistant to available drugs is continually increasing.
Accordingly, novel compositions and methods for assaying RNase P function
would be useful for identifying antimicrobial compounds against these pathogens. 

Summary of the Invention 

Certain RNase P amino acid positions are markedly conserved, as revealed 
by comparing the protein subunit sequences using the ClustalW multiple alignment
program, indicating that the residues may be important in RNase P function. The
invention features novel polypeptides related to the protein component of the
RNase P holoenzyme in several pathogenic bacterial species, as well as the nucleic
acid sequences which encode these proteins. The invention also features methods
of using these sequences to identify additional RNase P nucleic acids and proteins,
and methods to screen for compounds which inhibit the RNase P function. Such
compounds can be used as antibacterial agents. 

In the first aspect, the invention features an isolated polypeptide comprising 
an RNase P consensus sequence wherein said polypeptide has RNase P protein 
activity. In a preferred embodiment of this aspect, the polypeptide comprises an
amino acid sequence selected from SEQ ID NOS: 20-38. 

In the second aspect, the invention features an isolated nucleic acid 
sequence, wherein the sequence encodes a polypeptide comprising an amino acid 
sequence substantially identical to an amino acid sequence containing an RNase P
consensus sequence and has RNase P protein activity. In preferred embodiments, the
sequence encodes a polypeptide comprising an amino acid sequence selected from
SEQ ID NOS: 20-38 and/or the sequence is selected from SEQ ID NOS: 1-19.

In the third aspect, the invention features a transgenic host cell including a
heterologous nucleic acid sequence encoding the polypeptide of the first aspect of 
the invention. 

In the fourth aspect, the invention features an antibody that specifically 
binds to the polypeptide having SEQ ID NOS: 20-38 of the first aspect of the
invention.

In the fifth aspect, the invention features a method of identifying an 
antibiotic agent, said method including: i) obtaining an RNase P holoenzyme 
comprising the polypeptide of the first aspect of the invention; ii) contacting the 
holoenzyme with an RNase P substrate in the presence and in the absence of a
compound; and iii) measuring the enzymatic activity of the holoenzyme; wherein a
compound is identified as an antibiotic agent if said compound produces a
detectable decrease in said RNase P enzymatic activity as compared to activity in
the absence of the compound. In various preferred embodiments, the polypeptide
is substantially identical to a polypeptide of SEQ ID NOS: 20-38, the activity is
measured by fluorescence spectroscopy, the RNase P substrate is fluorescently
tagged ptRNAGln, the fluorescence analysis is carried out in a buffer comprising
10-40 mg/ml carbonic anhydrase and 10-100 µg/ml polyC, or the buffer further
includes at least one of the following: 0.5-5% glycerol; 10-100 µg/ml hen egg
lysozyme; 10-50 µg/ml tRNA; or 1-10 mM DTT.
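
The comparison in step iii) of the fifth aspect can be illustrated with a short
Python sketch. It is only a minimal example under stated assumptions: the two
activity readings are taken to be in the same arbitrary units, and the function
names and the cutoff value are hypothetical, since the text above requires only
a "detectable decrease" and specifies no numerical threshold.

```python
def percent_inhibition(activity_without, activity_with):
    """Percent decrease in RNase P activity when the compound is present."""
    if activity_without <= 0:
        raise ValueError("control activity must be positive")
    return 100.0 * (activity_without - activity_with) / activity_without


def is_candidate(activity_without, activity_with, min_decrease_pct=10.0):
    # min_decrease_pct is a hypothetical stand-in for "detectable decrease";
    # a real assay would derive it from the measured noise of the readout.
    return percent_inhibition(activity_without, activity_with) >= min_decrease_pct
```

For example, readings of 200.0 without the compound and 50.0 with it
correspond to a 75% decrease in activity.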

In the sixth aspect, the invention features a method for making a ptRNAGln
that includes annealing two RNA fragments together by heating to about 65 °C to
about 80 °C for about 5 minutes, followed by cooling to 20-25 °C.

The term "nucleic acid" encompasses both RNA and DNA, including 
cDNA, genomic DNA, complementary antisense nucleic acids capable of 
decreasing RNase P activity, and synthetic (e.g., chemically synthesized) DNA.
The nucleic acid may be double-stranded or single-stranded. Where single- 
stranded, the nucleic acid may be a sense strand or an antisense strand. 

By "isolated nucleic acid" is meant a DNA or RNA that is separated from 
the coding sequences with which it is naturally contiguous (one on the 5' end and
one on the 3' end) in the genome of the organism from which it is derived. Thus,
in one embodiment, an isolated nucleic acid includes some or all of the 5' and/or 3'
non-coding (e.g., promoter) sequences which are immediately contiguous to the
coding sequence. The term therefore includes, for example, a recombinant DNA
which is incorporated into a vector, into an autonomously replicating plasmid or
virus, or into the genomic DNA of a prokaryote or eukaryote, or which exists as a
separate molecule (e.g., a cDNA or a genomic DNA fragment produced by PCR or
restriction endonuclease treatment) independent of other sequences. It also
includes a recombinant DNA which is part of a hybrid gene encoding additional
polypeptide sequence.

By "isolated polypeptide" is meant a preparation which is at least 60% by 
weight (dry weight) the polypeptide of interest. Preferably the preparation is at 
least 75%, more preferably at least 90%, and most preferably at least 99%, by 
weight the polypeptide of interest. Purity can be measured by any appropriate
standard method, e.g., column chromatography, polyacrylamide gel
electrophoresis, or HPLC analysis. 

Moreover, an "isolated" nucleic acid or polypeptide is meant to include 
fragments which are not naturally occurring as fragments and would not be found 
in the natural state. 

By "a polypeptide containing RNase P activity" is meant a polypeptide
sequence that, when combined with an RNA subunit to form an RNase P
holoenzyme, has 20%, 50%, 75%, or even 100% or more, of the enzymatic activity
of an E. coli or B. subtilis RNase P holoenzyme. Preferably, the RNA subunit is
from the same species when activity is tested. The enzymatic activity can be
assessed, for example, by measuring hydrolysis of an RNase P substrate. Standard
methods for conducting such hydrolysis assays are described herein and in the
literature (see, e.g., Altman and Kirsebom, Ribonuclease P, The RNA World, 2nd
Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1999; Pascual
and Vioque, Proc. Natl. Acad. Sci. 96: 6672, 1999; Guerrier-Takada et al., Cell 35:
849, 1983; Tallsjo and Kirsebom, Nucleic Acids Research 21: 51, 1993;
Peck-Miller and Altman, J. Mol. Biol. 221: 1, 1991; Gopalan et al., J. Mol. Biol. 267:
818, 1997; and WO 99/11653).

By "RNase P substrate" is meant a substrate in which hydrolysis by an
RNase P holoenzyme requires the presence of the RNase P protein subunit.

By "identity" is meant the relationship between two or more polypeptide 
sequences or two or more nucleic acid sequences, as determined by comparing the 
degree of sequence relatedness. "Identity" can be readily calculated by known 
methods, including but not limited to those described in Computational Molecular
Biology, Lesk, A.M., ed., Oxford University Press, New York, 1988;
Biocomputing: Informatics and Genome Projects, Smith, D.W., ed., Academic
Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin,
A.M., and Griffin, H.G., eds., Humana Press, New Jersey, 1994; Sequence
Analysis in Molecular Biology, von Heijne, Academic Press, 1987; Sequence
Analysis Primer, Gribskov and Devereux, eds., M. Stockton Press, New York,
1991; and Carillo and Lipman, SIAM J. Applied Math. 48: 1073, 1988.

Methods to determine identity are designed to give the largest match 
between the sequences tested. Moreover, methods to determine identity are 
available in publicly available computer programs. Computer program methods to
determine identity between two sequences include, but are not limited to, the GCG
program package (Devereux et al., Nucleic Acids Research 12(1): 387, 1984),
BLASTP, BLASTN, and FASTA (Altschul et al., J. Mol. Biol. 215: 403, 1990).
The well known Smith Waterman algorithm may also be used to determine
identity. The BLAST program is publicly available from NCBI and other sources
(BLAST Manual, Altschul et al., NCBI NLM NIH Bethesda, MD 20894).
Searches can be performed at URLs such as the following:
http://www.ncbi.nlm.nih.gov/BLAST/unfinishedgenome.html; or 
http://www.tigr.org/cgi-bin/BlastSearch/blast.cgi. 

As an illustration of percent identity, if a test nucleic acid sequence (TN)
has 95% identity to a reference nucleic acid sequence (RN) at the specified bases,
then TN is identical to RN at the specified bases, except that TN may include point
mutations in 5% of the total number of nucleic acids present in RN. Thus, 5% of
nucleic acids found in RN may be deleted or substituted with another nucleic acid.
In addition, the sequence of TN may contain, as compared to the specified RN
bases, insertions of nucleic acids totaling up to 5% of the nucleic acids present in
RN. These mutations, as compared to the RN sequence, may occur at the 5' or 3'
terminal positions or anywhere between those terminal positions, interspersed
either individually among the specified nucleic acids or in one or more contiguous
groups of specified nucleic acids. As in the present invention, for nucleic acids
encoding proteins, trinucleotide sequences encoding the same amino acid may
optionally be treated as identical.
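
The optional treatment of synonymous trinucleotides as identical can be
illustrated with a short Python sketch. It assumes two in-frame, ungapped coding
sequences of equal length written in upper-case DNA letters, and it uses the
standard genetic code; the name codon_identity is hypothetical and is not part of
this specification.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codon_identity(test_nt, ref_nt):
    """Percent of codon positions at which both sequences encode the same residue."""
    if len(test_nt) != len(ref_nt) or len(ref_nt) == 0 or len(ref_nt) % 3:
        raise ValueError("sequences must be in frame, non-empty, and equal in length")
    pairs = [(test_nt[i:i + 3], ref_nt[i:i + 3]) for i in range(0, len(ref_nt), 3)]
    same = sum(CODON_TABLE[t] == CODON_TABLE[r] for t, r in pairs)
    return 100.0 * same / len(pairs)
```

Under this convention CTG and TTA, which both encode leucine, count as
identical even though they differ at two of three bases.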

Analogously, a test polypeptide (TP) has an amino acid sequence 95% 
identical to a reference amino acid sequence (RP) if TP is identical to RP at the
specified amino acids, except that TP contains amino acid alterations totaling 5%
of the total number of specified amino acids in RP. These alterations include
deletions of amino acids or substitutions with one or more other specified amino
acids. In addition, the alterations include insertions of other amino acids totaling
up to 5% of the total amino acids present in the specified RP amino acids. The
alterations in the TP amino acid sequence as compared to the RP sequence may
occur at the amino or carboxy terminal positions, or anywhere between those 
terminal positions, interspersed either individually among residues or in one or 
more contiguous groups. 
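
The polypeptide case can be sketched in Python as follows. This is a minimal
illustration, assuming the test and reference sequences have already been
aligned pairwise with "-" marking gaps; the function name and the choice to
report insertions separately are assumptions made for the example. Deletions and
substitutions lower the identity figure, and insertions relative to RP are
reported as their own percentage of the RP length.

```python
def percent_identity(aligned_test, aligned_ref, gap="-"):
    """Return (identity %, insertion %) of TP relative to the residues of RP."""
    if len(aligned_test) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(aligned_test, aligned_ref))
    ref_len = sum(r != gap for _, r in columns)
    if ref_len == 0:
        raise ValueError("reference contains no residues")
    # Reference positions where the test residue is the same residue.
    matches = sum(t == r for t, r in columns if r != gap)
    # Test residues that face a gap in the reference are insertions.
    inserted = sum(1 for t, r in columns if r == gap and t != gap)
    return 100.0 * matches / ref_len, 100.0 * inserted / ref_len
```

A 20-residue reference aligned to a test sequence carrying one substitution and
no insertions gives an identity of 95%.
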

By "an RNase P consensus sequence" is meant a sequence which, when
aligned to the E. coli RNase P sequence using the ClustalW program and
performing a comparison of the specified amino acid sequences, shows
conservation of at least nine of the following specified 20 amino acid residues in
the E. coli RNase P protein subunit: R11, L12, F18, R46, G48, V51, K53, K54,
A59, V60, R62, N63, K66, R67, R70, L80, D84, V86, L101, and L105.
Preferably, the consensus sequence conserves at least 13 of the 20 residues. It is
also preferred that the aligned consensus sequence contain at least seven of the
following subset of nine amino acid residues in the E. coli RNase P protein: F18,
R46, K53, A59, R62, N63, K66, R67, R70, more preferably, at least eight of the
amino acids, and, most preferably, all nine amino acids of the above subset. For
the purpose of determining identity in the present invention, the identity of amino
acids other than those specified in the consensus sequence is ignored in the
comparison. When calculating identity of nucleic acids encoding an RNase P
consensus sequence, degenerate codons encoding the designated amino acid are
treated as identical.

The RNase P sequences claimed as part of the present invention specifically 
exclude those sequences in the RNase P database (James W. Brown, The 
Ribonuclease P Database, Nucleic Acids Research 27(1):314 (1999)) as posted on
the internet on March 1, 2000. Also excluded are the RNase P polypeptide and
nucleic acids described by nucleic acid or amino acid sequence in EP 0 811 688 A2
(Staphylococcus aureus) and WO 99/11653 (S. pneumoniae).

A "substantially identical" RNase P sequence is one which has or encodes a
polypeptide having at least 95% identity, preferably 100% identity, to the twenty
amino acids provided above from the sequence of E. coli RNase P.

"Transformation" or "transfection" means any method for introducing 
foreign molecules, such as nucleic acids, into a cell. Lipofection, DEAE-dextran- 
mediated transfection, microinjection, protoplast fusion, calcium phosphate 
precipitation, retroviral delivery, electroporation, and biolistic transformation are 
just a few of the methods known to those skilled in the art which may be used.
These techniques may be applied to the transformation or transfection of a wide
variety of cell types and intact tissues including, without limitation, intracellular
organelles (e.g., mitochondria and chloroplasts), bacteria, yeast, fungi, algae,
animal tissue, and cultured cells.

By "transgenic host cell" is meant a cell (or a descendant of a cell)
transformed or transfected with a heterologous nucleic acid sequence comprising a
coding sequence operably linked to one or more sequence elements, e.g., a
promoter, which directs transcription and/or translation such that the heterologous
coding sequence is expressed in said host cell. The transgenic host cells may be
either stably or transiently transfected.

By "operably linked" is meant that a selected nucleic acid is positioned 
adjacent to one or more sequence elements, e.g., a promoter, which direct 
transcription and/or translation of the selected nucleic acid. 

By "specifically binds" is meant an antibody that recognizes and binds to
the full length protein or subfragment of any one of SEQ ID NOS: 20-38, but
which does not substantially recognize and bind to other molecules in a sample, 
including other RNase P proteins. 

Other features and advantages of the invention will be apparent from the
following detailed description and from the claims.

Description of the Figures

Fig. 1 shows the sequence alignment of previously known bacterial RNase 
P protein subunits using the ClustalW alignment program (Thompson et al., 
Nucleic Acids Research 22: 4673, 1994) and the alignment of the RNase P 
sequences of the present invention. The aligned fragments of the known RNase P
sequences are designated by (*) and the aligned fragments of the RNase P 
sequences of the invention are designated by (#). 






Figs. 2A-2S show the nucleic acid sequences (SEQ ID NOs 1-19)
encoding the amino acid sequences (SEQ ID NOs 20-38) of the bacterial RNase P
polypeptides of the invention. The nucleic acid and amino acid sequences were
derived from the following pathogenic bacterial species: Streptococcus mutans
(Fig. 2A; SEQ ID NOs: 1 and 18, respectively); Klebsiella pneumoniae (Fig. 2B;
SEQ ID NOs: 2 and 19, respectively); Salmonella paratyphi A (Fig. 2C; SEQ ID
NOs: 3 and 20, respectively); Pseudomonas aeruginosa (Fig. 2D; SEQ ID NOs: 4
and 21, respectively); Corynebacterium diphtheriae (Fig. 2E; SEQ ID NOs: 5 and
22, respectively); Chlamydia trachomatis (Fig. 2F; SEQ ID NOs: 6 and 23,
respectively); Vibrio cholerae Serotype O1, Biotype El Tor, Strain N16961 (Fig.
2G; SEQ ID NOs: 7 and 24, respectively); Neisseria gonorrhoeae FA 1090 (Fig.
2H; SEQ ID NOs: 8 and 25, respectively); Neisseria meningitidis Serogroup A,
Strain Z2491 (Fig. 2I; SEQ ID NOs: 9 and 26, respectively); Streptococcus
pyogenes M1 (Fig. 2J; SEQ ID NOs: 10 and 27, respectively); Bordetella pertussis
Tohama I (Fig. 2K; SEQ ID NOs: 11 and 28, respectively); Porphyromonas
gingivalis W83 (Fig. 2L; SEQ ID NOs: 12 and 29, respectively); Streptococcus
pneumoniae Type 4 (Fig. 2M; SEQ ID NOs: 13 and 30, respectively); Clostridium
difficile 630 (Fig. 2N; SEQ ID NOs: 14 and 31, respectively); Campylobacter
jejuni NCTC (Fig. 2O; SEQ ID NOs: 15 and 32, respectively); Bacillus anthracis
Ames (Fig. 2P; SEQ ID NOs: 16 and 33, respectively); Mycobacterium avium 104
(Fig. 2Q; SEQ ID NOs: 17 and 34, respectively); Staphylococcus aureus NCTC
8325 (Fig. 2R; SEQ ID NOs: 18 and 35, respectively); and Staphylococcus aureus
COL (Fig. 2S; SEQ ID NOs: 19 and 36, respectively).

Detailed Description 

The invention features novel polypeptides that form the protein component
of the RNase P holoenzyme in several pathogenic bacterial species, as well as the
nucleic acid sequences which encode these proteins. The invention also features
methods of using these sequences to form the protein subunit of RNase P
holoenzymes to screen for compounds which inhibit the function of the
holoenzymes. Such inhibitory compounds can be used as anti-bacterial agents.

1. Identification of the Novel RNase P Protein Subunits

The novel RNase P amino acid and nucleic acid sequences were discovered 
using the following strategy. First, the genomic databases of several pathogenic
bacteria were searched using the BLAST program (Altschul et al., J. Mol. Biol.
215: 403, 1990) and known RNase P polypeptide sequences from E. coli
(gram-negative) and B. subtilis (gram-positive) as "query" sequences. Given that the
largest number of known RNase P protein subunit sequences correspond to
sequences from gram-negative and gram-positive bacteria, "query" sequences
from both bacterial groups were used in the search to ensure that all novel 
sequences having homology to known RNase P sequences would be identified. 

BLAST searches of genomic databases for potential RNase P homologues 
were performed at the following URLs:
http://www.ncbi.nlm.nih.gov/BLAST/unfinishedgenome.html; and
http://www.tigr.org/cgi-bin/BlastSearch/blast.cgi.

The BLAST program considered only hits with a P-value of less than or
equal to 10^-5 to ensure that random hits were not sampled.
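
The significance filter can be sketched in Python as below. The sketch does not
call BLAST itself; it assumes the hits have already been parsed into simple
records, and the Hit class, its field names, and the function name are
hypothetical. Only the 10^-5 cutoff comes from the text above.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject_id: str   # identifier of the database sequence (hypothetical field)
    p_value: float    # P-value reported for the hit


P_VALUE_CUTOFF = 1e-5


def significant_hits(hits, cutoff=P_VALUE_CUTOFF):
    """Keep only hits whose P-value is less than or equal to the cutoff."""
    return [h for h in hits if h.p_value <= cutoff]
```

Hits that pass this filter are candidates only; the consensus sequence test
described next decides which of them are treated as RNase P protein subunits.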

The above-described searches often yielded multiple hits in the genomic 
databases. To identify which sequences were genuine RNase P protein subunits, 
we determined whether the sequences also contained an RNase P consensus
sequence, which we defined as a sequence that, upon alignment with known
RNase P sequences using the ClustalW program, conserves at least nine of the
following twenty amino acids in the E. coli RNase P protein sequence: R11, L12,
F18, R46, G48, V51, K53, K54, A59, V60, R62, N63, K66, R67, R70, L80, D84,
V86, L101, and L105. Preferred sequences contained at least thirteen out of the
twenty residues and/or had at least seven of the following amino acid subset: F18, 
R46, K53, A59, R62, N63, K66, R67, and R70. 
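
The consensus test can be sketched in Python as follows. This is an
illustration only: it assumes the candidate and the E. coli protein subunit are
supplied as two rows of the same alignment (for example, taken from ClustalW
output) with "-" marking gaps, and all function names are hypothetical. The
residue lists and the thresholds of nine, thirteen, and seven are those stated
above.

```python
CONSENSUS = ["R11", "L12", "F18", "R46", "G48", "V51", "K53", "K54", "A59", "V60",
             "R62", "N63", "K66", "R67", "R70", "L80", "D84", "V86", "L101", "L105"]
SUBSET = ["F18", "R46", "K53", "A59", "R62", "N63", "K66", "R67", "R70"]


def column_of_residue(aligned_ref, gap="-"):
    """Map 1-based residue numbers of the reference to alignment columns."""
    mapping, number = {}, 0
    for col, ch in enumerate(aligned_ref):
        if ch != gap:
            number += 1
            mapping[number] = col
    return mapping


def count_conserved(aligned_ref, aligned_candidate, positions):
    """Count listed positions at which the candidate carries the named residue."""
    cols = column_of_residue(aligned_ref)
    count = 0
    for label in positions:
        residue, number = label[0], int(label[1:])
        col = cols.get(number)
        if col is not None and aligned_candidate[col] == residue:
            count += 1
    return count


def has_consensus(aligned_ref, aligned_candidate):
    return count_conserved(aligned_ref, aligned_candidate, CONSENSUS) >= 9


def is_preferred(aligned_ref, aligned_candidate):
    return (count_conserved(aligned_ref, aligned_candidate, CONSENSUS) >= 13
            or count_conserved(aligned_ref, aligned_candidate, SUBSET) >= 7)
```

Numbering follows the reference row, so gaps that the alignment introduces into
the E. coli sequence do not shift the positions being checked.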

This RNase P consensus sequence was derived as follows. We aligned the 
sequences of the known bacterial RNase P protein subunits using the ClustalW 
alignment program (Thompson et al., supra) (see Fig. 1; the previously known
RNase P sequences are designated by (*) and were obtained from the RNase P
database, www.jwbrown.mbio.ncsu.edu/rnasp/home.html). This ClustalW
alignment was then manually refined to align highly conserved RNase P
hydrophobic and basic residues that had been demonstrated by mutation studies to
be important for RNase P catalytic function (Gopalan et al., J. Mol. Biol. 267: 818,
1997). The spacing between the conserved residues, as well as the identity of the 
individual residues, appears critical to RNase P function. 

Twenty amino acids were identified as highly conserved (shown as the
shaded residues in Fig. 1). The percent of RNase P sequences which conserve
each of the shaded residues is shown below the sequence information as percent
identity. Based upon these known sequences, we determined that a polypeptide
identified by our above-described RNase P BLAST search contained an RNase P
consensus sequence and was a genuine RNase P protein subunit if it contained at
least nine of the above-described twenty amino acids. Preferred polypeptides have
a consensus sequence with at least 13 of the amino acids and/or conserve at least
seven of the following subset of amino acids: F18, R46, K53, A59, R62, N63,
K66, R67, and R70. This subset of amino acids is preferred because it has been
identified as playing a significant role in RNase P function through mutation
studies (Gopalan et al., J. Mol. Biol. 267: 818, 1997) and the determination of the
RNase P three dimensional structure (Stams et al., Science 280: 752, 1998). As
shown in Fig. 2, the three dimensional structure reveals that all of the residues that
make up the above-described nine amino acid subset are proximal to each other in
the tertiary structure of the protein, despite the distance between some of the
residues in the primary structure.

2. RNase P Protein Amino Acid and Nucleic Acid Sequences

The novel RNase P proteins of the invention, and the nucleic acid 
sequences which encode the proteins, are derived from the following bacterial 
species: Streptococcus mutans UAB159; Klebsiella pneumoniae MGH 78578;
Salmonella paratyphi A (ATCC 9150); Pseudomonas aeruginosa PAO1;
Corynebacterium diphtheriae; Chlamydia trachomatis MoPn; Vibrio cholerae
Serotype O1, Biotype El Tor, Strain N16961; Neisseria gonorrhoeae FA 1090;
Neisseria meningitidis Serogroup A, Strain Z2491; Streptococcus pyogenes M1;
Bordetella pertussis Tohama I; Porphyromonas gingivalis W83; Streptococcus
pneumoniae Type 4; Clostridium difficile 630; Campylobacter jejuni NCTC;
Bacillus anthracis Ames; Mycobacterium avium 104; Staphylococcus aureus
NCTC 8325; and Staphylococcus aureus COL. The sequences are shown in Fig. 2.

All of the novel RNase P protein sequences were identified by the
above-described BLAST search. The alignment of these sequences with the known
RNase P sequences is also shown in Fig. 1 (the RNase P sequences of the present
invention are designated by (#)). This alignment demonstrates that the amino acid 
sequences of the invention all contain RNase P consensus sequences. Therefore, 
these polypeptides are genuine RNase P proteins. 

The RNase P identification is further supported by the protein structure of
the polypeptides of the invention, as determined by SWISS-MODEL. SWISS-MODEL
is an automated protein modelling server running at Glaxo Wellcome
Experimental Research in Geneva, Switzerland
(http://www.expasy.ch/swissmod/swiss.model). The polypeptide sequences of the
invention were readily folded (at least in part) into the tertiary structure of the B.
subtilis RNase P protein subunit (Stams et al., supra). It is noteworthy that 
conserved residues in the newly identified sequences are modeled into positions 
which are spatially and structurally identical to the RNase P protein subunit of B. 
subtilis. 

Further support for the RNase P identification is as follows. Using the
above-described BLAST search and consensus sequence determination, we
independently identified the sequence for an RNase P protein subunit from the
genomic database of Staphylococcus aureus (S. aureus). Although this sequence
had been previously identified as an RNase P protein subunit and its RNase P
activity had been confirmed by assay (EPA 0 811 688 A2), our independent
discovery of this RNase P sequence provides proof of principle that our method of
searching for RNase P protein subunits predictably identifies polypeptides that
have RNase P activity.

The invention features purified or isolated RNase P protein subunits. As
used herein, both "protein" and "polypeptide" mean any chain of amino acids, 
regardless of length or post-translational modification (e.g., glycosylation or 
phosphorylation). Thus, the term RNase P protein subunit includes full-length,
naturally-occurring RNase P proteins, preproteins, and proproteins, as well as
recombinantly or synthetically produced polypeptides that correspond to
full-length, naturally-occurring RNase P proteins or to particular domains or portions
of naturally-occurring proteins. These proteins are produced using standard
techniques (see, e.g., Ausubel et al., Current Protocols in Molecular Biology, John
Wiley & Sons, New York, 1995; Pouwels et al., Cloning Vectors: A Laboratory
Manual, 1985 (1987 Suppl.); and Sambrook et al., Molecular Cloning, A
Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring
Harbor, NY, 1989). 

Preferred RNase P proteins include a sequence substantially identical to all 
or a portion of a naturally occurring RNase P protein subunit, e.g., including all or
a portion of any of the sequences shown in Fig. 2 (SEQ ID NOS: 20-38).

In the case of polypeptide sequences which are less than 100% identical to a 
reference sequence, the non-identical positions are preferably, but not necessarily, 
conservative substitutions for the reference sequence. Conservative substitutions 
typically include substitutions within the following groups: glycine and alanine;
valine, isoleucine, and leucine; aspartic acid and glutamic acid; asparagine and
glutamine; serine and threonine; lysine and arginine; and phenylalanine and 
tyrosine. 
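
The substitution groups listed above can be written as a small lookup, sketched
here in Python. One-letter amino acid codes are used, and the function name is
hypothetical; only the seven groups themselves come from the text.

```python
# Conservative substitution groups from the paragraph above (one-letter codes).
CONSERVATIVE_GROUPS = [
    set("GA"),   # glycine, alanine
    set("VIL"),  # valine, isoleucine, leucine
    set("DE"),   # aspartic acid, glutamic acid
    set("NQ"),   # asparagine, glutamine
    set("ST"),   # serine, threonine
    set("KR"),   # lysine, arginine
    set("FY"),   # phenylalanine, tyrosine
]


def is_conservative(reference_residue, test_residue):
    """True if the two residues differ but fall within one listed group."""
    if reference_residue == test_residue:
        return False  # identical residues are not a substitution
    return any(reference_residue in g and test_residue in g
               for g in CONSERVATIVE_GROUPS)
```

With this lookup, a lysine-to-arginine change is classed as conservative and a
lysine-to-glutamic acid change is not.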

Preferred polypeptides are those which are soluble under normal 
physiological conditions. Also within the invention are soluble fusion proteins in 
which a full-length or subfragment of RNase P protein (e.g., one or more domains)
is fused to an unrelated protein or polypeptide (i.e., a fusion partner) to create a 
fusion protein. 

Structurally related RNase P polypeptides of the invention include, but are 
not limited to, polypeptides with additions or substitutions of amino acid residues
within the amino acid sequence encoded by the RNase P nucleic acid sequences
described herein, where these changes result in a silent change, thus producing a
functionally equivalent gene product. Amino acid substitutions may be made on
the basis of similarity in polarity, charge, solubility, hydrophobicity,
hydrophilicity, and/or the amphipathic nature of the residues involved. For
example, nonpolar (hydrophobic) amino acids include alanine, leucine, isoleucine,
valine, proline, phenylalanine, tryptophan, and methionine; polar neutral amino
acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and
glutamine; positively charged (basic) amino acids include arginine, lysine, and
histidine; and negatively charged (acidic) amino acids include aspartic acid and
glutamic acid. 

Preferred RNase P polypeptides and variants have 20%, 50%, 75%, 90%, or 
even 100% or more of the activity of one of the bacterial RNase P proteins of SEQ
ID NOS: 20-38 shown in Fig. 2. Such comparisons are generally based on equal
concentrations of the molecules being compared. The comparison can also be
based on the amount of protein or polypeptide required to reach the maximal 
activation obtainable. 

In general, RNase P proteins according to the invention can be produced by 
transformation (transfection, transduction, or infection) of a host cell with all or
part of an RNase P-encoding nucleic acid sequence of the present invention in a
suitable expression vehicle. Such expression vehicles include: plasmids, viral
particles, and phage. For insect cells, baculovirus expression vectors are suitable.
The entire expression vehicle, or a part thereof, can be integrated into the host cell
genome. In some circumstances, it is desirable to employ an inducible expression
vector, e.g., the LACSWITCH™ Inducible Expression System (Stratagene, La Jolla,
CA).

Those skilled in the field of molecular biology will understand that any of a 
wide variety of expression systems can be used to provide the recombinant protein 
(see, e.g., Ausubel et al., Current Protocols in Molecular Biology, John Wiley &
Sons, New York, 1995; Pouwels et al., Cloning Vectors: A Laboratory Manual,
1985 (1987 Suppl.); and Sambrook et al., Molecular Cloning, A Laboratory
Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY,
1989). The precise host cell used is not critical to the invention. The RNase P
protein can be produced in a prokaryotic host (e.g., E. coli or B. subtilis) or in a
eukaryotic host (e.g., Saccharomyces or Pichia; mammalian cells, e.g., COS, NIH
3T3, CHO, BHK, 293, or HeLa cells; or insect cells; or plant cells).

The host cells harboring the expression vehicle can be cultured in 
conventional nutrient media adapted as needed for activation of a chosen gene, 
repression of a chosen gene, selection of transformants, or amplification of a
chosen gene. 

RNase P proteins can be produced as fusion proteins. For example, the 
expression vector pUR278 (Ruther et al., EMBO J. 2: 1791, 1983) can be used to
create lacZ fusion proteins. The pGEX vectors can be used to express foreign
polypeptides as fusion proteins with glutathione S-transferase (GST). In general,
such fusion proteins are soluble and can be easily purified from lysed cells by
adsorption to glutathione-agarose beads followed by elution in the presence of free
glutathione. The pGEX vectors are designed to include thrombin or factor Xa
protease cleavage sites so that the cloned target gene product can be released from
the GST moiety.

The invention also features the isolated nucleic acid sequences of SEQ ID
NOS: 1-19 shown in Fig. 2, and nucleic acid sequences that encode one or more
portions or domains of an RNase P protein subunit, including but not limited to the
α1, α2, α3, β1, β2, β3, and β4 portions of the protein.

Preferred nucleic acids encode polypeptides that are soluble under normal
physiological conditions. Also within the invention are nucleic acids encoding
fusion proteins in which the whole RNase P protein or a sub-fragment is fused to
an unrelated protein or polypeptide (e.g., a marker polypeptide or a fusion partner)
to create a fusion protein. For example, the polypeptide can be fused to a
hexa-histidine tag to facilitate purification of bacterially expressed protein, or to a
hemagglutinin tag to facilitate purification of protein expressed in eukaryotic cells.

The fusion partner can be, for example, a polypeptide which facilitates
secretion, e.g., a secretory sequence. Such a fused protein is typically referred to
as a preprotein. The secretory sequence can be cleaved by the host cell to form the
mature protein. Also within the invention are nucleic acids that encode mature
RNase P protein fused to a polypeptide sequence to produce an inactive
proprotein. Proproteins can be converted into the active form of the protein by
removal of the inactivating sequence.

The nucleic acids of the invention further include sequences that hybridize,
e.g., under high stringency hybridization conditions (as defined herein), to all or a
portion of the nucleic acid sequence of any one of SEQ ID NOS: 1-19, or any of their
complements. As used herein, high stringency conditions include hybridizing at
68 °C in 5x SSC/5x Denhardt solution/1.0% SDS, or in 0.5 M NaHPO4 (pH 7.2)/
1 mM EDTA/7% SDS, or in 50% formamide/0.25 M NaHPO4 (pH 7.2)/0.25 M
NaCl/1 mM EDTA/7% SDS; and washing in 0.2x SSC/0.1% SDS at room
temperature or at 42 °C, or in 0.1x SSC/0.1% SDS at 68 °C, or in 40 mM NaHPO4
(pH 7.2)/1 mM EDTA/5% SDS at 50 °C, or in 40 mM NaHPO4 (pH 7.2)/1 mM
EDTA/1% SDS at 50 °C. The parameters of salt concentration and temperature
can be varied to achieve the desired level of identity between the probe and the
target nucleic acid. Further guidance regarding hybridizing conditions is provided,
for example, in Sambrook et al., Molecular Cloning, A Laboratory Manual, Cold
Spring Harbor Press, NY, 1989, and Ausubel et al., Current Protocols in Molecular
Biology, John Wiley & Sons, NY, 1995.

The hybridizing portion of the hybridizing nucleic acids is preferably 20,
30, 50, or 70 bases long. Preferably, the hybridizing portion of the hybridizing
nucleic acid is 80%, more preferably 95%, or even 98% identical, to the sequence
of a portion or all of a nucleic acid encoding an RNase P protein subunit.
Hybridizing nucleic acids of the type described above can be used as a cloning
probe, a primer (e.g., a PCR primer), or a diagnostic probe. Preferred hybridizing
nucleic acids encode a polypeptide having some or all of the biological activities
possessed by a naturally-occurring RNase P protein subunit. Such biological
activity can be determined by functional RNase P assay as described herein.
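
By way of illustration only, the identity levels recited above (80%, 95%, 98%) can be checked with a short calculation. The following is a minimal sketch that assumes the hybridizing portion and the corresponding target portion have already been aligned without gaps; the function names and the example sequences are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(probe: str, target: str) -> float:
    """Percent of positions that match between two pre-aligned sequences.

    Assumes an ungapped alignment, so both strings have the same length.
    """
    if not probe or len(probe) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(probe.upper(), target.upper()))
    return 100.0 * matches / len(probe)


def highest_identity_level(identity: float):
    """Return the highest recited identity level (98, 95, or 80) that is met, else None."""
    for level in (98, 95, 80):
        if identity >= level:
            return level
    return None


# Hypothetical 20-base hybridizing portion and a target with one mismatch.
probe = "ACGTACGTACGTACGTACGT"
target = "ACGTACGTACGTACGTACGA"
identity = percent_identity(probe, target)
print(identity, highest_identity_level(identity))
```

A gapped alignment would require a proper alignment tool; the sketch only counts matching positions in sequences of equal length.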

Hybridizing nucleic acids can be additional splice variants of the RNase P
protein gene. Thus, they may encode a protein which is shorter or longer than the
different forms of RNase P described herein. Hybridizing nucleic acids may also
encode proteins that are related to RNase P (e.g., proteins encoded by genes which
include a portion having a relatively high degree of identity to the RNase P genes
described herein).

The invention also features vectors and plasmids that include a nucleic acid
of the invention which is operably linked to a transcription and/or translation
sequence to enable expression, e.g., expression vectors.

2. RNase P Antibodies

The bacterial RNase P proteins and polypeptides (or immunogenic
fragments or analogs) can be used to raise antibodies useful in the invention, and
such polypeptides can be produced by recombinant or peptide synthetic techniques
(see, e.g., Ausubel et al., Current Protocols in Molecular Biology, John Wiley &
Sons, New York, 1995). In general, the peptides can be coupled to a carrier
protein, such as KLH, mixed with an adjuvant, and injected into a host mammal.
Antibodies can be purified by peptide antigen affinity chromatography.

In particular, various host animals can be immunized by injection with an
RNase P protein or polypeptide. Host animals include rabbits, mice, guinea pigs,
and rats. Various adjuvants can be used to increase the immunological response,
depending on the host species, including but not limited to, Freund's (complete
and incomplete), mineral gels such as aluminum hydroxide, surface active
substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil
emulsions, keyhole limpet hemocyanin, dinitrophenol, and potentially useful
human adjuvants such as BCG (bacille Calmette-Guerin) and Corynebacterium
parvum.

Antibodies within the invention include polyclonal antibodies, humanized
or chimeric antibodies, single chain antibodies, Fab fragments, F(ab')2 fragments,
molecules produced using a Fab expression library, and monoclonal antibodies.

Monoclonal antibodies can be prepared using the RNase P proteins
described above and standard hybridoma technology (see, e.g., Kohler et al.,
Nature 256: 495, 1975; Kohler et al., Eur. J. Immunol. 6: 511, 1976; Hammerling
et al., In Monoclonal Antibodies and T Cell Hybridomas, Elsevier, NY, 1981;
Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, New
York, 1995).

In particular, monoclonal antibodies can be obtained by any technique that
provides for the production of antibody molecules by continuous cell lines in
culture such as described in Kohler et al., Nature 256: 495, 1975, and U.S. Patent
No. 4,376,110; the human B-cell hybridoma technique (Kozbor et al., Immunology
Today 4: 72, 1983; Cole et al., Proc. Natl. Acad. Sci. USA 80: 2026, 1983), and the
EBV-hybridoma technique (Cole et al., Monoclonal Antibodies and Cancer
Therapy, Alan R. Liss, Inc., pp. 77-96, 1983). Such antibodies can be of any
immunoglobulin class including IgG, IgM, IgE, IgA, IgD, and any subclass
thereof. The hybridoma producing the mAb of this invention can be cultivated in
vitro or in vivo. The ability to produce high titers of mAbs in vivo makes this the
presently preferred method of production.

Once produced, polyclonal or monoclonal antibodies are tested for specific
RNase P recognition by Western blot or immunoprecipitation analysis by standard
methods, for example, as described in Ausubel et al., Current Protocols in
Molecular Biology, John Wiley & Sons, New York, 1995. Preferred antibodies
specifically bind the RNase P proteins of the invention.

Preferably, the antibodies of the invention are produced using fragments of
the RNase P protein which lie outside highly conserved regions and appear likely
to be antigenic, by criteria such as high frequency of charged residues. In one
specific example, such fragments are generated by standard techniques of PCR,
and are then cloned into the pGEX expression vector. Fusion proteins are
expressed in E. coli and purified using a glutathione agarose affinity matrix
(Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons,
New York, 1995).

Another aspect of the invention features a method for detecting an RNase P
protein. This method includes: contacting an antibody that specifically binds an
RNase P protein of the present invention to a biological sample under conditions
that allow the formation of RNase P-antibody complexes; and detecting the
complexes, if any, as an indication of the presence of RNase P protein in the
biological sample.

3. Screening for Antibacterial Agents: Example 

The rnpA genes encoding the RNase P proteins or protein subfragments of
the invention are amplified from genomic DNA by established PCR methods. The
amplified DNA sequences that encode the RNase P protein genes are subcloned
into expression plasmids, which contain fusion sequences allowing the subcloned
gene to be expressed in a transformed or transfected host cell as a "tagged" fusion
protein. E. coli cells are transformed with the plasmid DNA, protein expression is
induced, and the overexpressed fusion protein is isolated by affinity purification
according to established protocols.

Each of the purified RNase P proteins is combined with a renatured cognate
RNase P RNA subunit from the same, or a different, bacterial organism, under
conditions that reconstitute enzymatic activity. It is possible to reconstitute a
functional RNase P holoenzyme using a protein subunit and an RNA subunit from
different species (e.g., B. subtilis, E. coli, or S. aureus). The conditions for
reconstitution include heat denaturing the RNA subunit and then slowly cooling it
in a physiologically similar buffer. A buffer for folding the RNA component of RNase
P is 10-50 mM Tris-HCl/MOPS/HEPES (pH 7.0-8.0), 25-500 mM KCl/NaCl/NH4Cl,
and 1-25 mM MgCl2. The RNA is heated to 65 °C for 5 minutes, 55 °C for
minutes, and 37 °C for 5 minutes. The protein is then added along with 1-10 mM DTT
and the incubation continued at 37 °C for 5 minutes. Similar heating protocols
known in the art may also be used. The protein will then be incubated briefly with
the renatured RNA to reconstitute holoenzyme activity.

The RNase P substrates used in the assay are labeled. Examples of labeled
nucleotides that can be incorporated into the RNA substrates include BrdUrd (Hoy
and Schimke, Mutation Research 290: 217, 1993), BrUTP (Wansink et al., J. Cell
Biology 122: 283, 1993) and nucleotides modified with biotin (Langer et al., Proc.
Natl. Acad. Sci. USA 78: 6633, 1981) or with suitable haptens such as
digoxigenin (Kerkhof, Anal. Biochem. 205: 359, 1992). Suitable fluorescence-
labeled nucleotides are Fluorescein-isothiocyanate-dUTP, Cyanine-3-dUTP and
Cyanine-5-dUTP (Yu et al., Nucleic Acids Res. 22: 3226, 1994). A preferred
nucleotide analog label for RNA molecules is Biotin-14-cytidine-5'-triphosphate.
Fluorescein, Cy3, and Cy5 can be linked to dUTP for direct labeling. Cy3.5 and
Cy7 are available as avidin or anti-digoxigenin conjugates for secondary detection
of biotin- or digoxigenin-labeled probes.

The amplified rnpA genes may also be cloned into expression vectors not
containing encoded fusion tag sequences, but still containing an inducible
promoter. After induction, the overexpressed protein can be purified essentially by
the protocol for purification of E. coli RNase P protein (Baer et al., 1990).

Examples of RNA substrates that can be used to measure RNase P
enzymatic activity include the full-length substrate ptRNA Tyr (pTyr) (Altman and
Kirsebom, The RNA World, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold
Spring Harbor, NY, 1999), and ptRNA Gln (pGln), an 85-mer from the
cyanobacterium Synechocystis (Pascual and Vioque, Proc. Natl. Acad. Sci. USA
96: 6672, 1999) or a substrate obtained from the homologous bacteria.

A modified ptRNA Gln substrate can also be used, in which the 5' end is
fluorescently tagged in order to monitor hydrolysis using fluorescence
spectroscopy. Given that the chemical synthesis of an 85-mer with a fluorescent
tag is technically impractical, and the fluorescent modification of enzymatically
synthesized RNA is difficult, the preferred method of synthesizing a fluorescently
tagged pGln is conducted with the following two steps: a 5' fluorescently modified
26-nucleotide fragment is chemically synthesized and annealed to a 3' 59-nucleotide
fragment that has been enzymatically synthesized. These two
fragments, when annealed, form a full-length pGln substrate. The unligated
junction between the two fragments occurs in the D-loop, a region that is not
required for function by the RNase P holoenzyme.

In addition, substrates that contain only the minimally required structural
elements for recognition by the enzyme can also be utilized for this reaction,
although the Km values for these substrate fragments are usually much higher than
those of the above-described full-length substrates. One example of a substrate
fragment is p10AT1, a 45-mer that contains a 10-nucleotide 5' leader sequence, an
extended 12-base pair stem which is made up of the aminoacyl acceptor stem, a
T-stem, and a single loop. The Km for hydrolysis reactions using this simplified
substrate fragment rises to greater than (McClain et al., 1987). Therefore, while the
substrate fragment is easier to construct, it requires a higher concentration in an
enzymatic assay.

The progress of the RNase P-mediated hydrolysis reaction is monitored, for
example, by fluorescence spectroscopy. For example, a fluorescence polarization
assay for RNase P activity is conducted by labeling the 5' end of the substrate, for
example, the 45-mer (p10AT1) or the 85-mer (pGln) substrate, with an appropriate
fluorophore. Given that compounds in screening libraries often interfere with
fluorescence measurements in the blue to yellow region of the spectrum, preferred
fluorophores emit light in the red region of the spectrum (e.g., TAMRA
(Molecular Probes, OR) and Cy3 labeled nucleotide (Dharmacon Research, CO)).
Samples of the RNase P holoenzyme and the RNase P substrate are mixed,
incubated, and measured for spectrophotometric polarization. When the substrate
is cleaved by the RNase P holoenzyme, the 10-nucleotide 5' leader sequence is
released, which leads to a substantial change in the fluorescence polarization in the
sample (Campbell, I.D. & Dwek, R.A., pp. 91-125, The Benjamin/Cummings
Publishing Company, Menlo Park, CA (1984); Lakowicz, J.R., Plenum Press, NY
(1983)).
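
For illustration, the polarization readout referred to above can be computed from the parallel and perpendicular emission intensities using the standard definition P = (I_par - G x I_perp) / (I_par + G x I_perp). The sketch below is not part of the described assay protocol; the function names, the instrument correction factor G (assumed to be 1.0 by default), and the example intensities are all hypothetical.

```python
def polarization_mP(i_parallel: float, i_perpendicular: float, g_factor: float = 1.0) -> float:
    """Fluorescence polarization in millipolarization units (mP).

    g_factor is an instrument-specific correction applied to the
    perpendicular channel; 1.0 is an assumed default.
    """
    corrected = g_factor * i_perpendicular
    total = i_parallel + corrected
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return 1000.0 * (i_parallel - corrected) / total


def polarization_change_mP(before: tuple, after: tuple, g_factor: float = 1.0) -> float:
    """Change in polarization between two (parallel, perpendicular) readings."""
    return polarization_mP(*after, g_factor=g_factor) - polarization_mP(*before, g_factor=g_factor)


# Hypothetical readings: intact labeled substrate, then after leader release.
intact = (300.0, 100.0)
cleaved = (220.0, 180.0)
print(polarization_mP(*intact), polarization_mP(*cleaved), polarization_change_mP(intact, cleaved))
```

Release of the short labeled leader is expected to lower the polarization, so a negative change of this kind would indicate cleavage under these assumptions.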

The preferred reaction buffer contains 50 mM Tris-HCl (pH 7.5), 100 mM
ammonium chloride and 10 mM magnesium chloride. Concentrations of 10-100
mM, 25-500 mM and 1-100 mM of the above, respectively, can be substituted, as
can other buffering agents such as MOPS or HEPES, or other monovalent cations,
such as sodium or potassium. When the assay is run in either 96- or 384-well
polystyrene or polypropylene assay plates, there is a very significant decrease in
the fluorescence intensity and polarization of the annealed substrate over time in
the absence of enzyme. Various conditions have been tested to prevent the loss of
signal with time. The preferred conditions include addition of 10-40 µg/ml
carbonic anhydrase and 10-100 µg/ml polyC to the buffer. Other materials, such
as 0.5-5% glycerol, 10-100 µg/ml hen egg lysozyme, 10-50 µg/ml tRNA, or 2-10
mM DTT, can also be added to the buffer to prevent some loss of signal.
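
As an illustrative aid, the preferred buffer composition and the substitutable concentration ranges stated above can be captured in a small lookup table and checked programmatically. The following is a minimal sketch; the dictionary keys and the function name are hypothetical, and only the numerical ranges are taken from the text.

```python
# Allowed ranges and preferred values, in mM, as stated in the text.
ALLOWED_RANGES_MM = {
    "tris_hcl": (10.0, 100.0),
    "monovalent_salt": (25.0, 500.0),    # ammonium chloride, or a sodium or potassium salt
    "magnesium_chloride": (1.0, 100.0),
}
PREFERRED_MM = {
    "tris_hcl": 50.0,
    "monovalent_salt": 100.0,
    "magnesium_chloride": 10.0,
}


def components_out_of_range(buffer_mm: dict) -> list:
    """Return names of components that are missing or outside the stated ranges."""
    problems = []
    for name, (low, high) in ALLOWED_RANGES_MM.items():
        value = buffer_mm.get(name)
        if value is None or not (low <= value <= high):
            problems.append(name)
    return problems


print(components_out_of_range(PREFERRED_MM))
print(components_out_of_range({"tris_hcl": 50.0, "monovalent_salt": 100.0,
                               "magnesium_chloride": 200.0}))
```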

The RNase P hydrolysis rate can also be monitored using a radiolabeled
substrate, performing a surface proximity assay (SPA) and measuring hydrolysis
by scintillation counting. For example, the substrate is anchored to the surface of
the assay plate via a biotin-streptavidin interaction between a biotinylated
nucleotide in the anticodon loop and a streptavidin matrix on the plate. The
substrate is also 33P-labeled at the 5' end. Using this method, RNase P-mediated
hydrolysis of the 5' leader sequence results in decreased scintillation due to reduced
proximity of the radiolabel to the scintillant-coated plate (Brown et al.,
FlashPlate Technology, in J.P. Devlin (Ed.), Marcel Dekker, Inc., NY, pp. 317-
328).

A bipartite substrate for RNase P, consisting of a 5'-end Cy3 labeled 26mer
and an in vitro T7-polymerase transcribed 59mer, is preferred for screening. The
26mer consists of the first 26 contiguous nucleotides of the pre-tRNA substrate
including the 10-nucleotide leader sequence. The two RNA fragments are
annealed together under appropriate conditions of stoichiometry (59mer in 20 to
100% excess) and temperature in a buffer system consisting of 50 mM Tris-HCl
(pH 7.5), 100 mM ammonium chloride and 10 mM magnesium chloride. Briefly, the
two RNA fragments are mixed together and heated to between 65 and 80 °C for
about 5 minutes and then slowly cooled to room temperature.
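
The stoichiometry stated above (59mer in 20 to 100% molar excess over the labeled 26mer) reduces to a one-line calculation. The sketch below is offered only as an illustration; the function name and the example amount of 26mer are hypothetical.

```python
def required_59mer_pmol(amount_26mer_pmol: float, excess_fraction: float) -> float:
    """Amount of 59mer (pmol) for a given molar excess over the 26mer.

    excess_fraction is the fractional excess: 0.20 for 20%, 1.00 for 100%.
    """
    if amount_26mer_pmol <= 0:
        raise ValueError("amount of 26mer must be positive")
    if not 0.20 <= excess_fraction <= 1.00:
        raise ValueError("excess must lie in the stated 20% to 100% range")
    return amount_26mer_pmol * (1.0 + excess_fraction)


# Hypothetical example: 100 pmol of labeled 26mer with a 50% excess of 59mer.
print(required_59mer_pmol(100.0, 0.50))
```

Keeping the unlabeled 59mer in excess presumably helps ensure that the labeled 26mer is fully annealed, which is why the check rejects values below the stated 20% minimum.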

In addition, the RNase P enzyme activity can also be measured using
standard techniques described in the literature (see, e.g., Altman and Kirsebom,
Ribonuclease P, The RNA World, 2nd Ed., Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, NY, 1999; Pascual and Vioque, Proc. Natl. Acad. Sci. 96:
6672, 1999; Guerrier-Takada et al., Cell 35: 849, 1983; Tallsjo and Kirsebom,
Nucleic Acids Research 21: 51, 1993; Peck-Miller and Altman, J. Mol. Biol. 221:
1, 1991; Gopalan et al., J. Mol. Biol. 267: 818, 1997; and WO 99/11653).

To screen for compounds that inhibit the activity of the RNase P
holoenzymes of the present invention, compounds are added to a final
concentration of 10 before the addition of substrate to the sample. A
compound is determined to be an inhibitor if it significantly reduces RNase P
hydrolysis as compared to the compound-free control sample. Ideally, the
compounds identified as inhibitors selectively inhibit one of the RNase P
holoenzymes of the invention without affecting other RNase P holoenzymes. Such
inhibitors have the advantage of providing a selective antibacterial treatment that
reduces the adverse side effects associated with killing nonpathogenic bacteria.
Use of such selective inhibitors also reduces the risk of producing a wide range of
resistant bacterial strains.
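
The comparison against the compound-free control can be expressed, for illustration, as a percent inhibition. The sketch below is one possible way to score a compound; the 50% cutoff is an assumption introduced here, since the text requires only a significant reduction and states no numerical threshold, and the function names and example rates are hypothetical.

```python
def percent_inhibition(rate_with_compound: float, rate_control: float) -> float:
    """Percent reduction in hydrolysis rate relative to the compound-free control."""
    if rate_control <= 0:
        raise ValueError("control rate must be positive")
    return 100.0 * (1.0 - rate_with_compound / rate_control)


def is_inhibitor(rate_with_compound: float, rate_control: float,
                 threshold_percent: float = 50.0) -> bool:
    """Flag a compound as an inhibitor; the default threshold is an assumed cutoff."""
    return percent_inhibition(rate_with_compound, rate_control) >= threshold_percent


# Hypothetical hydrolysis rates in arbitrary units.
print(percent_inhibition(25.0, 100.0), is_inhibitor(25.0, 100.0))
print(is_inhibitor(90.0, 100.0))
```

Applying the same calculation to each RNase P holoenzyme in a panel would give a simple measure of the selectivity discussed above.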
In general, extracts, compounds, or chemical libraries that can be used in
screening assays are known in the art. Examples of such extracts or compounds
include, but are not limited to, extracts based on plant, fungal, prokaryotic, or
animal sources, fermentation broths, and synthetic compounds, as well as
modification of existing compounds. Numerous methods are also available for
generating random or directed synthesis (e.g., semi-synthesis or total synthesis) of
any number of chemical compounds, including, but not limited to, saccharide-,
lipid-, peptide-, and nucleic acid-based compounds. Libraries of genomic DNA or
cDNA may be generated by standard techniques (see, e.g., Ausubel et al., supra)
and are also commercially available (Clontech Laboratories Inc., Palo Alto, CA).

Synthetic compound libraries are commercially available from Brandon
Associates (Merrimack, NH) and Aldrich Chemical (Milwaukee, WI).
Alternatively, libraries of natural compounds in the form of bacterial, fungal, plant,
and animal extracts are commercially available from a number of sources,
including Biotics (Sussex, UK), Xenova (Slough, UK), Harbor Branch
Oceanographics Institute (Ft. Pierce, FL), and PharmaMar, U.S.A. (Cambridge,
MA). In addition, natural and synthetically produced libraries are produced, if
desired, according to methods known in the art, e.g., by standard extraction and
fractionation methods.

When a crude extract is found to modulate an RNase P holoenzyme
activity, further fractionation of the positive lead extract is necessary to isolate
the chemical constituents responsible. Thus, the goal of the extraction, fractionation,
and purification process is the characterization and identification of a chemical
entity within the crude extract having the modulating activities. The same assays
described herein for the detection of inhibitors in mixtures of compounds can be
used to purify the active component and to test derivatives thereof. Methods of
fractionation and purification of such heterogeneous extracts are known in the art.
If desired, compounds shown to be useful agents for treatment are chemically
modified according to methods known in the art.

Compounds which modulate an RNase P holoenzyme activity may be
administered by any appropriate route for treatment or prevention of a disease or
condition associated with a bacterial infection. Administration may be topical,
parenteral, intravenous, intra-arterial, subcutaneous, intramuscular, intracranial,
intraorbital, ophthalmic, intraventricular, intracapsular, intraspinal, intracisternal,
intraperitoneal, intranasal, aerosol, by suppositories, or oral administration.

Therapeutic formulations may be in the form of liquid solutions or 
suspensions; for oral administration, formulations may be in the form of tablets or 
capsules; and for intranasal formulations, in the form of powders, nasal drops, or 
aerosols.

Methods well known in the art for making formulations are found, for
example, in "Remington's Pharmaceutical Sciences." Formulations for parenteral
administration may, for example, contain excipients, sterile water, or saline,
polyalkylene glycols such as polyethylene glycol, oils of vegetable origin, or
hydrogenated naphthalenes. Biocompatible, biodegradable lactide polymer,
lactide/glycolide copolymer, or polyoxyethylene-polyoxypropylene copolymers
may be used to control the release of the compounds. Other potentially useful
parenteral delivery systems include ethylene-vinyl acetate copolymer particles,
osmotic pumps, implantable infusion systems, and liposomes. Formulations for
inhalation may contain excipients, for example, lactose, or may be aqueous
solutions containing, for example, polyoxyethylene-9-lauryl ether, glycocholate and
deoxycholate, or may be oily solutions for administration in the form of nasal
drops, or as a gel. The concentration of the compound in the formulation will vary
depending upon a number of factors, including the dosage of the drug to be
administered, and the route of administration.

The formulations can be administered to human patients in therapeutically
effective amounts (e.g., amounts which prevent, eliminate, or reduce a
pathological condition) to provide therapy for a disease or condition associated
with infection. Typical dose ranges are from about 0.1 µg/kg to about 1 g/kg of
body weight per day. The preferred dosage of drug to be administered is likely to
depend on such variables as the type and extent of the disorder, the overall health
status of the particular patient, the formulation of the compound excipients, and its
route of administration.

Other Embodiments

All publications and patent applications mentioned in this specification are 
herein incorporated by reference. 

While the invention has been described in connection with specific 
embodiments, it will be understood that it is capable of further modifications. 
Therefore, this application is intended to cover any variations, uses, or adaptations
of the invention that follow, in general, the principles of the invention, including 
departures from the present disclosure that come within known or customary 
practice within the art. Other embodiments are within the claims. 

What is claimed is:

1. An isolated polypeptide comprising an RNase P consensus sequence
wherein said polypeptide has RNase P protein activity.

2. The polypeptide of claim 1, wherein said polypeptide comprises an 
amino acid sequence selected from the group consisting of SEQ ID NOS: 20-38. 

3. An isolated nucleic acid sequence, wherein said sequence encodes a 
polypeptide containing an RNase P consensus and said polypeptide has RNase P 
protein activity. 

4. The nucleic acid sequence of claim 3, wherein said sequence encodes a 
polypeptide comprising an amino acid sequence selected from the group consisting 
of SEQ ID NOS: 20-38. 

5. The nucleic acid sequence of claim 4, wherein said sequence is selected 
from the group consisting of SEQ ID NOS: 1-19. 

6. A transgenic host cell, wherein said cell comprises a heterologous 
nucleic acid sequence encoding the polypeptide of claim 1. 

7. An antibody that specifically binds to the polypeptide of claim 1. 

8. A method of identifying an antibiotic agent, said method comprising: 

i) obtaining an RNase P holoenzyme comprising the polypeptide of claim 1; 

ii) contacting said holoenzyme with an RNase P substrate in the presence 
and in the absence of a compound; and 

iii) measuring the enzymatic activity of said holoenzyme; 
wherein a compound is identified as an antibiotic agent if said compound produces
a detectable decrease in said RNase P enzymatic activity as compared to activity in
the absence of said compound.

9. The method of claim 8, wherein said polypeptide is substantially
identical to a polypeptide of SEQ ID NOS: 20-38.

10. The method of claim 8, wherein said activity is measured by 
fluorescence spectroscopy. 

11. The method of claim 8, wherein said RNase P substrate is fluorescently
tagged ptRNA Gln.

12. A method for making a ptRNA Gln, said method comprising annealing
two RNA fragments together by heating to about 65 °C to about 80 °C for about 5
minutes, followed by cooling to 20-25 °C.

13. The method of claim 8, wherein said fluorescence analysis is carried
out in a buffer comprising 10-40 mg/ml carbonic anhydrase and 10-100 µg/ml
polyC.

14. The method of claim 13, wherein said buffer further comprises at least 
one of the following: 

0.5-5% glycerol; 
10-100 µg/ml hen egg lysozyme;
10-50 µg/ml tRNA; or
1-10 mM DTT.

NOVEL BACTERIAL RNase P PROTEINS AND THEIR USE IN
IDENTIFYING ANTIBACTERIAL COMPOUNDS



Abstract of the Disclosure 
The invention features novel RNase P molecules and nucleic acids encoding 
the same. Methods for discovery of antimicrobial compounds are also featured. 



Streptococcus mutans UAB159 (119 aa)
Amino acid sequence:

VLKKAYRVKSDKDFQAIFTEGRSVANRKFWYSLEK^ 
QDFWIARKGVEELDYSTMKKNLVHVLKLAKLYQEGS IREKE 
Nucleotide sequence (plus strand):

AGAT TT T TGGCTT TT TCTCAT TTTATGATATAATAGTGATAAT TTAAA.TAT TGGAGTCAT CTTT TGAAAAAAGCCTA 
TCGCGTTAAAAGTGATAAAGATTTTCAGGCAATTTTTACTGAAGGACGAAGTGTTGCCAATCGGAAATTTGTTGTCT 
ATAG T TTAGAAAAAGATCAAAGTCACTATCGTGT TGGAC TTTCAGTTGGAAAAAGATTAGGAAATGC TGTCGT TAGA 
AATGCGATTAAACGAAAATTGCGCCATGTCCTTATGGAACTTGGTCCTTATTTAGGCACTCAAGATTTTGTTGTTAT 
TGCTAGAAAAGGTGTTGAGGAACTTGATTATAGCACGATGAAAAAAAATCTGGTTCATGTTTTAAAACTGGCTAAAC 
TGTATCAGGAAGGATCTATTCGTGAAAAAGAA 

Sequence origin: University of Oklahoma ACGT; Contig 299

Klebsiella pneumoniae MGH 78578 (119 aa)
Amino acid sequence: 

Nucleotide sequence (plus strand) : 



^CAACGGGCTGGCACGCCGC 
?CGCCAAGAAAAACGTGAAA 

GGATTTCGTGGTGGTGGCGAAAAGAGGGGTTGCCr'"-^^ 






GACCTCGATAACCGTGCTCTCTCGGAAGCGTTGGAAAAATT. 



'AT 



Sequence origin: Washington University; Contig 632 



FIG. 2B 

Salmonella paratyphi A ATCC 9150 (110 aa) 
Amino acid sequence: 

Nucleotide sequence (plus strand) : 


S^CCGACCTCC^TAACCGTGCTCTCTC^^ 
GGTCCTGA^CCTTATTCGGGTC^ 

Sequence origin: Washington University

Pseudomonas aeruginosa PAO1 (135 aa)
Amino acid sequence: 

WSRDFDRDKRLLTARQFSAVFDS PTGKVPGKHVLLLARENGLDHPRLGLVI GKKNVKLAVQRNRLKRLIRE S FRHN 

QETLAGWDIWIARKGLGELENPELHQQFGKLWKRLLRNRPRTESPADAPGVADGTHA 

Nucleotide sequence (plus strand) : 

TCTGTCGCGTCGTCGCGCCAAAGGCCGTAAGCGTCTGACCGTCTGATTTATCCGGTACGGGrGGTGAGTCGGGACTT 

CGACCGGGACAAGCGTCTACTGACAGCCCGGCAATTCAGCGCAGTCTTCGACTCTCCGACCGGCAAGGTCCCCGGCA 

AGCACGTCCTGCTGCTGGCGCGCGAGAACGGTCTCGATCACCCCCGCCTGGGCCTGGTGATCGGCAAGAAGAACGTC 

AAGCTCGCCGTCCAGCGCAATCGCCTCAAACGCCTGATCCGCGAATCGTTCCGCCATAACCAGGAAACCCTGGCTGG 

CTGGGATATCGTGGTGATCGCGCGCAAAGGCCTGGGCGAACTGGAAAATCCGGAGCTGCACCAGCAGTTCGGCAAGC 

TCTGGAAACGCCTGTTGCGCAATCGACCTCGCACGGAAAGCCCTGCTGACGCCCCTGGCGTGGCCGACGGTACTCAT 

GCATAGGTCGATGCCCGCGCATCCCGATCCCTGTAGTGTCATCCCCCCTTCGATGACCCGGCACCG 

Sequence origin: Pathogenesis & University of Washington; Contig 54

Corynebacterium diphtheriae (129 aa)
Amino acid sequence: 

VTLTS SNRT TVLPS QHKLSNSEQFRAT IRKGKRAGRS T WLHFYAEATAGNLATAGGPRFGLWS KAVGNAVTRHRV 
SRQLRHWIAMKDQFPAS SHVWRAI PPAATAS YEELRADVQAALDKLNRKR 
Nucleotide sequence (plus strand) : 

CCGGTCGCGCAATCGTGGCTGCACGTCGTAACAAGGGTCGTAAGAGCCTGACCGCTTAAG gTC ACTCTTACAAGCTC 
GAATAGAACGACGGTGCTACCTTCACAGCACAAGCTCAGCA^^ 

AGCGTGCTGGGAGGAGCACCGTCGTTCTTCATTTTTATGCTGAGGCGACCGCGGGCAACCTTGCAACCGCAGGCGGC 

CCGCGATTCGGCCTCGTTGTGTCCAAGGCTGTTGGAAATGCTGTGACTCGTCACCGTGTTTCGCGGCAGTTAAGGCA 

CGTAGTAATCGCTATGAAAGACCAGTTCCCAGCGTCATCCCATGTTGTTGTGAGGGCGATACCGCCAGCGGCGACAG 

CAAGTTATGAGGAGTTGCGGGCAGATGTGCAGGCAGCACTCGACAAGCTCAACCGCAAGCG ATAA GGCGGTTACTCG 

CCCTCGTGGGCTGGTTAGTCGCGCATTGTTTGATGCGGTGCGGTTCTA 

Sequence origin: Sanger centre; Contig 390

Chlamydia trachomatis MoPn (119 aa)
PNLPACQVWSPKGGTLPNFGKLSADLLKHIPEALPLVTSSK 

Nucleotide sequence (plus strand) : TCACGGCAGAC ATTCCTTAATTGATCTCTAAGATCT 
GCTACAAAAAGTGGAAGAAATCTTTTAAATCGTCGTC 

TTCATTTGIPGCATCGGTTAACT^ 

GTGGGCAATATTGTCGTACTCmTCAGGCAACTTTAC^ 
GTTACTGTTTCTAAAAAATTTGGGAAAGCCCATCAGCGCAATC 

AAATAAAAAACCATTCCACGCTATAGAGGCATGGAATGGGAA 
Sequence origin: TIGR & Manitoba University

Vibrio cholerae serotype O1, Biotype El Tor, Strain N16961 (122 aa)

Amino acid sequence: 

SRIILSTYAFNRELRLLTPEHYQKVFQQM 

RLHQNQLANKDFWIAKKSAQDLSNEELFNLLGKLWQRLSRPSRG 

Nucleotide sequence (minus strand): *NO INITIATOR CODON BEFORE STOP*
GGCAGCGT GGGCCGATAAGTGGAC TAATAAACCAC TGGTAAAG T T T TACAATACCAATGGCTAACCACGAGAAGGGC 
GAGAGAGGCGTTGCCATAGTTTGCCAAGCAAGTTAAACAGTTCTTCATTGCTCAAATCTTGCGCGCTCTTTTTGGCG 
AT GACAACAAAAT C T T TGT T AGCCAG T T GATT T TGAT G TAAGCGAAAGC T T T C T C T GCAAAT ACGTT TGAATCGAT T 
ACGGCCGACGGCAGTT T T GAT C TGC T T T T TAGGAACCGCGAGTC CCAAACGAGGATGAGAAAGGTTAT TAGCGCGAG 
CGAT GAT T GTGAGATGAGGAGAACCAGCAC TGT GAGCT T GCTGGAAGAC T T T T T GATAAT GT TCGGGAGT TAACAAA 
CGTAACTCCCGATTGAATGCGTACGTACTCAAAATAATTCGAGATTATTTTGACAGGCGCTTACGGCCTTTTGCACG 
ACG T GCAT TCAGAACT T TACGACCGT TCGC 
Sequence origin: TIGR

Neisseria gonorrhoeae FA 1090 (123 aa)
Amino acid sequence: 

VILDYRFGRQYRLLKTDDFSSVFAFRNRRSRDLLQVSRSNGNGLDHPR 
NKNRL PPQD FWRVRRKFDRAT AKQARAE LAQLM FGNPAT GCGKQV 
Nucleotide sequence (minus strand) : 

ATGTTCCTTGTATGGGAAACCCGTTGCCGTCTGAACCTTGCCTGCAGGGTACCGTTCTGATCATACCTGTTTCCCGC 
ATCCGGTTGCGGGGTTGCCGAACATGAGTTGTGCCAGTTCCGCCCTTGCCTGTTTTGCGGTAGCCCTGTCGAATTTC 
CGGCGGACGCGCACGACGAAATCCTGAGGCGGCAGCCGGTTTTTGTTCAATCTGAACCAGTCGCGGATGACGCGTTT 
CATATAGTTCCGCTCGTTGGCGCGTTTGGCGGTTTTTTTGCCGACCACCAGACCGATGCGGGGATGGTCCAGCCCGT 
TGCCGTTTGAGCGCGAAACTTGCAGCAGGTCGCGGCTGCGGCGGTTTCTGAATGCAAAAACGGATGAAAAATCATCC 
GTTTTTAACAAGCGGTACTGCCTTCCGAAGCGGTAGTCCAAAATrACACTGCCAGGCGTTTGCGGCCTTTGGCACGG 
CGTGCGGCCAATACTGCGCGTCCGCCGCGT 

Sequence origin: University of Oklahoma ACGT; Contig 60 



FIG. 2H 



Neisseria meningitidis serogroup A Strain Z2491 (123 aa) 
Amino acid sequence: 

VILDYRFGRQYRLLKTDDFSSVFAFRNRRSRDLLQVSRSNGN 
NKNRL PPQD FWRVRRK FDRATAKQARAE LAQLMFGNPATGCRKQA 
Nucleotide sequence (minus strand) : 

TGTTCCTTAGTATGGGAAACCCGTTGCCGTCTGAACCTTGCCTGCAGAGTACCGTTCTGATCATGCCTGTTTCCTGC 
ATCCGGTTGCGGGGTTGCCGAACATGAGTTGTGCCAGTTCCGCCCTTGCCTGTTTTGCGGTAGCCCTGTCGAATTTA 
CGGCGGACGCGCACGACGAAATCCTGCGGCGGCAGCCGGTTTTTGTTCAATCTGAACCAGTCGCGGATGACGCGCTT 
CATATAATTTCGTTCGTTGGCGCGTTTGGCGGTTTTTTTGCCGACCACCAGACCGATGCGGGGATGATCCAGCCCGT 
TGCCGTTTGAACGCGAAACTTGCAGCAGGTCGCGGCTGCGGCGGTTTCTGAATGCAAAAACGGATGAAAAATCATCC 
GTTTTCAACAAGCGGTACTGCCTTCCGAAGCGGTAGTCCAAAATTACACCGCCAGGCGTTTGCGGCCTTTGGCGCGC 
CGTGCGGCCAATACTGCGCGTCCGCCGCGC 

Sequence origin: Sanger centre & Oxford University 
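
The listings marked "minus strand" give the strand opposite to the coding strand, so the amino acid sequence is read from the reverse complement of the nucleotide sequence. The following is a minimal Python sketch of that relationship, assuming the standard genetic code; the function names and the choice of fragment are illustrative assumptions and are not part of the original disclosure. The 45-nucleotide fragment is copied from the second-to-last full line of the Neisseria meningitidis minus-strand listing above.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon from position 0; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Fragment of the N. meningitidis minus-strand listing (illustrative choice).
fragment = "TTTCAACAAGCGGTACTGCCTTCCGAAGCGGTAGTCCAAAATTAC"
peptide = translate(reverse_complement(fragment))
print(peptide)  # VILDYRFGRQYRLLK
```

The printed peptide matches the first fifteen residues of the Neisseria meningitidis amino acid sequence listed above. The sketch handles only unambiguous upper-case A, C, G and T, so lines that still contain extraction damage would need to be corrected before use.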
Streptococcus pyogenes M1 (113 aa)
Amino acid sequence: 

VKREKDFQAIFKDGKSTANRKFVIYHLNRGQDHFRVGISVGKKIGNAVTRNAVKRKIRHVIMALGHQLKSEDFVVIA
RKGVESLEYQELQQNLHHVLKLAQLLEKGFESEEKH
Nucleotide sequence (minus strand):

GTTACCTCACCACGACCACAGGCCACTAATAATAGAACTAAGGGGACTATTCTTGCAATTTTAATGTTTTTCTTCAC
TCTCAAAACCTTTCTCAAGCAATTGTGCTAACTTTAAAACATGATGTAAATTTTGTTGAAGCTCTTGATACTCCAAA 
GATTCGACACCCTTACGGGCAATCACCACGAAATCCTCTGACTTCAGCTGATGCCCTAATGCCATGATAACATGACG
TATCTTTCGTTTGACTGCATTTCTGGTGACTGCATTTCCTATTTTTTTACCGACAGAAATACCCACACGGAAGTGGT
CTTGGCCTCTATTTAAATGATAAATGACAAATTTTCGATTTGCTGTACTTTTTCCATCCTTAAATATGGCTTGGAAA 
TCTTTCTCACGCTTGACACGATAGGTCTTCTTCAAAATTTAACTCCAATATCTAAATTATTACCATTATACCACATC
Sequence origin: University of Oklahoma ACGT; Contig 7 
Bordetella pertussis Tohama I (123 aa)
Amino acid sequence: 

MPRATLPAEARLHRPSEFAAALKGRRLARGAFFIVSASPCAPADDQPARARLGLVIAKRFAARAVTRNTLKRVIREA
FRARRLALPAQDYVVRLHSKLTPASLTALKRSARAEVDAHFTRIAR

Nucleotide sequence (minus strand):

CCACCCAGGGGCTGAGGAAGTACCGGTAAAACCGGATCGGGGCGATAAGCAGTCTCCTGATCATCGCGCTATCCGTG 
TGAAGTGAGCATCTACTTCGGCGCGCGCCGAGCGTTTCAGGGCCGTGAGGCTTGCCGGTGTCAGCTTGCTGTGCAGC 
CGCACCACGTAATCCTGGGCCGGCAGGGCAAGCCGGCGAGCCCGGAACGCTTCGCGGATGACCCGCTTCAAGGTATT 
GCGCGTCACGGCGCGGGCGGCAAAACGCTTGGCGATCACCAGGCCCAGGCGCGCGCGCGCCGGCTGGTCATCAGCAG 
GGGCACAGGGCGAGGCGCTGACAATAAAGAAAGCCCCTCGGGCCAGTCGCCGGCCTTTGAGGGCGGCGGCAAACTCG 
GAGGGGCGATGCAATCGCGCCTCCGCAGGGAGCGTGGCGCGCGGCATCGGTGACGTGACGGAGACTGGCGACGGGGC 
CGGCGGCGATGCTCCTGTTACAGGCAATCC 

Sequence origin: Sanger centre & MDS; Contig 267 
Porphyromonas gingivalis W83 (137 aa)
Amino acid sequence:

Nucleotide sequence (minus strand):

TAAACGACAAAC

Sequence origin: TIGR & Forsyth Dental Center
Streptococcus pneumoniae Type 4 (124 aa)
Amino acid sequence: 

VLKKNFRVKREKDFKAIFKEGTSFANRKFVVYQLENQKNRFRV 
DVDFVVIARKGVETLGYAEMEKNLLHVLKLSKIYREGNGSEKETKVD
Nucleotide sequence (minus strand):

TCGCTAGTTACCCCATTAGTCGCACAGGCTGTCATGATTAACAGAGACAGTCCTAGCAAACTAGTCAACTTTAGTTT
CTTTTTCACTCCCATTTCCTTCCCGGTAAATCTTTGATAATTTTAATACATGGAGTAGATTTTTCTCCATCTCTGCG 
TATCCCAAGGTTTCGACTCCTTTTCGAGCAATGACAACAAAGTCGACATCTTCTACCAGACTCCCTTTTGCATTCTG 
GATAATATGCCGAATCCGTCGCTTAATTTGATTTCTAGTGACGGCATTCCCCAGTTTTTTGCTAACTGATAGACCTA
CTCGAAAACGGTTTTTCTGGTTTTCTAATTGGTAGACCACAAATTTGCGATTAGCAAAACTTGTCCCCTCCTTGAAA 
ATCGCCTTAAAATCTTTCTCTCTTTTTACACGAAAGTTTTTCTTCAAAACTCAACTCCATCTATTAAATTACTACTA
TTATACCATATTTTTCAAAAAAGCCAATCATAG
Sequence origin: TIGR; 



Clostridium difficile 630 (epidemic type X) (114 aa) 
Amino acid sequence:

MDFNRTKGLKKDSDFRKVYKHGKSFANKYLVIYILKNKSDYSRVGISVSKKVGKAITRNRVRRLIKEAYRLNIDEKI
KPGYDIVFIARVSSKDATFKDIDKSIKNLVKRTDISI
Nucleotide sequence (minus strand):

TCCTTTAATATATAAATTATTTTATTCAAAGTCATTAACCTCCATATTTATAGCATACAATTAAATAGAAATATCCG
TTCTTTTAACTAAATTTTTTATAGACTTGTCTATGTCTTTAAAAGTAGCATCCTTACTAGATACCCTTGCTATAAAT
ACTATATCATATCCAGGCTTAATTTTTTCATCAATATTTAATCTGTAGGCTTCTTTTATTAATCTTCTTACTCTATT
CCTAGTAATAGCTTTTCCTACTTTTTTTGAAACAGAAATACCTACTCTACTATAATCTGATTTATTTTTAAGTATAT
ATATTACTAAATATTTGTTTGCAAAAGATTTGCCGTGTTTATATACTTTTCTAAAATCAGAGTCTTTTTTCAACCCT
TTAGTCCTATTAAAGTCCATAGTTAACCTCCATAAACACAGCTATGAATCGTAATTATTTACACAAAAAGGCCACCT
TTG 

Sequence origin: Sanger centre; Contig 975 
Campylobacter jejuni NCTC (108 aa)
Amino acid sequence:
VKNFDKFSTNEEFSSVYKVGKKWHCEGVIIFYLNS
IFVAKNEITELSFSRLEKNLKWGLKKLECFK
Nucleotide sequence (minus strand):
Bacillus anthracis Ames (119 aa)
Amino acid sequence: 
MKKKHRIKKNDEFQTVFQKGKSNA^ 

GKDFVIIARKPCAEMTYEELKKSLIHVFKRSGMKRIKSSVRK
Nucleotide sequence (minus strand):

TAARCCTAATTTCTTTTTCAAAGCCTACTCCTCCTTGTATCGGTATGTATATAGTGTAATTCATTTCCTTACGCTAC 

TTTTTATTCTTTTCATACCAGAGCGTTTAAAGACATGAATTAAGCTTTTCTTTAATTCTTCATATGTCATCTCTGCA 

CAAGGCTTCCTTGCTATTATAACAAAATCTTTTCCAGAATCTATCTCATCTTTTAATTCTGTGATCGACTGGCGAAT 

CATACGTTTAATTCGGTTACGCACTACTGCATTTCCTATCTTCTTGCTGACAGAAAGGCCAATACGAAAGTTTGGCT

GCTCTTCTTTATCTAGTTGATAGACAACAAATTGACGATTCGCATTCGATTTTCCTTTTTGAAAAACCGTCTGGAAT 

TCATCATTCTTTTTTATACGATGTTTTTTCTTCATATCAATTGACACTCCTGTAGTTCATCAGCGGAAATTCACTAT

TATTAGAAAAAAAGACCA 

Sequence origin: TIGR; 
Mycobacterium avium 104 (119 aa)
Amino acid sequence: 

VLPARNRMTRSTEFDATVKHGTRMAQPDIVVHLRRDSEPDDESAGPRVGLVVGKAVGTAVQRHRVARRLRHVARALL

GELEPSDRLVIRALPGSRTASSARLAQELQRCLRRMPAGTGP 

Nucleotide sequence (minus strand):

GTCCGCGGGCGACGGTTCGGCCGGCGCCGCGAATGGCCGCGCCCGACCGCGCCGGTCCGGTCACGGCCCGGTTCCCG 

CCGGCATGCGCCGCAGGCACCGCTGCAGTTCCTGCGCCAGGCGCGCCGACGACGCGGTCCGGCTTCCGGGCAGCGCG 

CGAATCACCAGCCGGTCGGATGGTTCGAGTTCGCCGAGCAGGGCCCGGGCCACGTGACGCAGCCGGCGGGCCACGCG 

GTGTCGTTGCACCGCCGTCCCGACGGCCTTCCCGACGACCAGCCCGACCCGTGGGCCCGCGGATTCGTCGTCGGGTT 

CGGAGTCGCGCCGGAGGTGGACGACGATGTCGGGCTGCGCCATGCGGGTTCCGTGCTTCACCGTCGCGTCAAACTCG 

GTTGACCGCGTCATGCGGTTGCGTGCGGGAAGCACCGCGAAAGACCTGACGTGCGATCAGGCAGAGAGCGCGCGGCG 

ACCCTTGCGGCGCCGACC 

Sequence origin: TIGR; 
Staphylococcus aureus NCTC 8325 (117 aa)
Amino acid sequence: 

MLLEKAYRIKKNADFQRIYKKGHSVANRQFVVYTCNNKEIDH 
LAKDIIVIARQPAKDMTTLQIQNSLEHVLKIAKVFNKKIK 
Nucleotide sequence (plus strand):

GTTATAAGCTCAATAGAAGTTTAAATATAGCTTCAAATAAAAACGATAAATAAGCGAGTGA^CTTATTGGAAAAAGC 
TTACCGAATTAAAAAGAATGCAGATTTTCAGAGAATATATAAAAAAGGTCATTCTGTAGCCAACAGACAATTTGTTG 
TATACACTTGTAATAATAAAGAAATAGACCATTTTCGCTTAGGTATTAGTGTTTCTAAAAAACTAGGTAATGCAGTG 
TTAAGAAACAAGATTAAAAGAGCAATACGTGAAAATTTCAAAGTACATAAGTCGCATATATTGGCCAAAGATATTAT 
TGTAATAGCAAGACAGCCAGCTAAAGATATGACGACTTTACAAATACAGAATAGTCTTGAGCACGTACTTAAAATTG 
CCAAAGTTTTTAATAAAAAGATTAAGTAAGGATAGGGTAGGGGAAGGAAAACATTAACCACTCAACACATCCCGAAG 

TCTTACCTCAGA 

Sequence origin: University of Oklahoma ACGT; Contig 561 
Staphylococcus aureus COL (117 aa)
Amino acid sequence:

MLLEKAYRIKKNADFQRIYKKGHSVANRQFVVYTCNNKEIDH

LAKDI IVIARQPAKDMTTLQIQNSLEHVLKIAKVFNKKIK 

Nucleotide sequence (plus strand):
GTTATAAGCTCAATAGAAGTTTAAATATAGCTTCAAATAAAAACGATAAATAAGCGAGTGA3CTTATTGGAAAAAGC 

TTACCGAATTAAAAAGAATGCAGATTTTCAGAGAATATATAAAAAAGGTCATTCTGTAGCCAACAGACAATTTGTTG 

TATACACTTGTAATAATAAAGAAATAGACCATTTTCGCTTAGGTATTAGTGTTTCTAAAAAACTAGGTAATGCAGTG 

TTAAGAAACAAGATTAAAAGAGCAATACGTGAAAATTTCAAAGTACATAAGTCGCATATATTGGCCAAAGATATTAT
TGTAATAGCAAGACAGCCAGCTAAAGATATGACGACTTTACAAATACAGAATAGTCTTGAGCACGTACTTAAAATTG 
CCAAAGTTTTTAATAAAAAGATTAAGTAAGGATAGGGTAGGGGAAGGAAAACATTAACCACTCAACACATCCCGAAG 

TCTTACCTCAGA 

Sequence origin: TIGR; 